DOCUMENT RESUME 



ED 065 605                                TM 001 875

AUTHOR        Block, James H.
TITLE         Student Evaluation: Toward the Setting of Mastery
              Performance Standards.
PUB DATE      4 Apr 72
NOTE          28p.; Paper presented at the annual meeting of the
              AERA (Chicago, Ill., April 4, 1972)
EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   *Academic Achievement; *Cognitive Processes; Data
              Collection; Evaluation Techniques; *Feasibility
              Studies; Learning Processes; Models; Objectives;
              *Performance Criteria; Standards; *Student
              Evaluation; Task Performance; Test Results



ABSTRACT 

When the task of evaluating student learning is carefully considered, two major problems emerge. One is the gathering of the most appropriate and precise evidence possible about the learning. The other is the setting of performance standards against which this evidence may be weighed and the adequacy of each student's learning judged. This paper has focused on the problem of setting performance standards for use in strategies for mastery learning. The paper began with the argument that a key variable in the design of these strategies is the set of mastery performance standards which students are helped to attain throughout their instruction. It was pointed out that presently there are no procedures for setting such standards. Next, an attempt was made to formulate one such procedure. The approach developed utilizes students' future learning, i.e., their scores on a set of desired, end-of-instruction learning outcomes, as a criterion for determining the mastery performance level which students must attain at any stage in their instruction. Finally, the paper reported an experiment designed to explore the feasibility of the approach proposed. The experiment was designed to test the assumption that the performance standard which a student attains over each segment of his instruction has important implications for his realization of the desired, end-of-instruction learning outcomes. In general, the experiment's results confirmed the assumption tested.
(Author/CK)








STUDENT EVALUATION: TOWARD THE SETTING

OF MASTERY PERFORMANCE STANDARDS 



James H. Block 

University of California, Santa Barbara






AERA

Chicago, Illinois 
April 4, 1972 






Although the problem of setting performance standards is old, perhaps never has it assumed greater importance than in the design of mastery learning strategies. These strategies are designed on the assumption that attainment of particular standards throughout the instructional process will help a maximum number of students reach desired end-of-instruction learning outcomes. Without procedures for selecting standards whose maintenance produces the desired outcomes, therefore, these strategies cannot be consistently well designed.

Presently there are no such procedures. In part this is due to the type of standard which must be set. While there are well-developed procedures for setting relative (e.g., Angoff, 1971) or absolute (e.g., Nedelsky, 1954) standards for use in interpreting scores on norm-referenced tests, mastery performance standards must be absolute standards for use in interpreting scores on criterion-referenced tests (Block, 1971; Bormuth, 1971).¹

In larger part, though, the lack of sound procedures for setting mastery performance standards is due to how these standards must be set. As Bormuth (1971) has argued, the setting of mastery performance standards requires rational techniques which are capable of yielding standards whose superiority as indices of the adequacy of a student's learning can be defended both logically and empirically.

¹ For purposes of this paper, a norm-referenced test may be defined as an instrument designed to indicate how well the student has learned a given segment of instruction relative to his peers, while a criterion-referenced instrument may be defined as one designed to indicate what the student has or has not learned from the segment.

But present techniques for setting mastery standards are essentially irrational and yield indefensible standards. For example, perfect performance, i.e., a perfect test score over a set of instructional objectives, is often set as a mastery standard. Yet data from both laboratory learning research (Bormuth, 1971) and studies which employ perfect performance as their mastery standard (e.g., Sherman, 1967) suggest that perfect performance is unrealistic to expect and prohibitively expensive to attain. Or, to take another example, mastery standards are also often set in terms of achievement of some fixed proportion, say 85 percent, of a set of objectives. But why this proportion is a more meaningful index of mastery than, say, achievement of 80 or 90 percent of the objectives is rarely explained or defended.

If the current trend toward increased use of mastery learning strategies continues, then the need for better approaches to the problem of setting mastery performance standards is obvious. This paper attempts to begin to fulfill this need by (a) formulating a general approach for the setting of defensible mastery standards and (b) testing the approach's feasibility.

An Approach for Setting Mastery Performance Standards 


According to Cahen (1970), one way to assess the learning outcomes of any instructional segment is to examine how well that segment has prepared the student for future learning. This idea has major implications for the setting of mastery performance standards for two reasons. First, in mastery strategies the major outcome over any instructional segment is the performance level to which each student learns. And second, these strategies are explicitly designed such that the level to which students are helped to learn should maximize each student's likelihood of attaining a set of desired, end-of-instruction learning outcomes.

In a mastery learning context, therefore, Cahen's notion can be translated as follows: one way to assess mastery over any segment of instruction is to examine how well the attainment of various performance levels prepares students for attaining the desired learning outcomes. That level which best prepares students vis-a-vis these outcomes can then be selected as one's mastery standard. For example, one could set a mastery performance standard for a two-unit instructional sequence, where achievement and interest were the desired learning outcomes, by determining the performance level whose attainment best maximized these outcomes.

Three major steps must be taken to implement this approach. First, the learning outcomes to be maximized by the instruction's completion must be characterized by a set of defensible learning criteria. This entails the creation of some practical (Schwab, 1968) methodology for making justifiable value judgments. The method must be practical because we have no comprehensive theory of the outcomes of instruction (Bormuth, 1971; Gagné, 1970) and hence no theory which might guide one in selecting among existing criteria or generating new criteria for representing the learning. The method must make value judgments, i.e., select some subset of criteria from the range of possibilities available according to some priorities, because the choice of criteria represents essentially a value or values judgment (Scriven, 1967; Messick, 1970).




Next, some model for interrelating, weighting and combining scores over the various criteria must be developed (Bormuth, 1971). This model has two functions. First, it attempts to capture the wholeness and the complexity of the learning to be maximized. Second, it provides a decision function for selecting that standard which best maximizes the learning in cases where the attainment of different performance levels maximizes scores on different criteria. Suppose, for example, that the learning to be maximized is represented by two criteria, achievement and interest, and that the attainment of one performance level maximizes future achievement while the attainment of another maximizes future interest. Depending upon one's model, the standard would be set either closer to the first performance level or closer to the second. If achievement plays a far larger role than interest in the model, then the standard would be set closer to the first level. But if interest plays a far larger role than achievement, then the converse would be true.
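The decision function described above can be illustrated with a minimal sketch. It assumes the simplest possible model, a linear weighted sum of standardized criterion estimates; the paper does not prescribe this or any other particular model, and the function names, weights, performance levels and predicted scores below are hypothetical. Because the sketch chooses among a few discrete candidate levels, "closer to the first level" becomes "the candidate level with the higher composite score".

```python
from typing import Dict


# Hypothetical model: a weighted sum of standardized (z-score) criterion estimates.
# The weights express the priorities assigned to each criterion.
def composite_score(estimates: Dict[str, float], weights: Dict[str, float]) -> float:
    return sum(weight * estimates[criterion] for criterion, weight in weights.items())


def select_standard(predicted: Dict[int, Dict[str, float]], weights: Dict[str, float]) -> int:
    """Return the candidate performance level (percent) with the highest composite score."""
    return max(predicted, key=lambda level: composite_score(predicted[level], weights))


# Hypothetical standardized predictions: one level favors interest, the other achievement.
predicted = {
    80: {"achievement": 0.2, "interest": 0.9},
    95: {"achievement": 0.9, "interest": 0.1},
}

print(select_standard(predicted, {"achievement": 0.8, "interest": 0.2}))  # prints 95
print(select_standard(predicted, {"achievement": 0.2, "interest": 0.8}))  # prints 80
```

Shifting the weights from achievement to interest moves the selected standard from one level to the other, which is the behavior the paragraph describes.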

Finally, having defined the learning to be maximized by a set of defensible criteria and a model which incorporates these criteria, maximal learning must be clearly defined. Here a statistical technique must be selected for estimating future learning (i.e., estimating scores from the model) as a function of the performance level to which the unit or units over which the standard will be set are learned. Least-squares (e.g., regression), Bayesian or other estimation procedures might be used. That level which yields the greatest estimated future learning can then readily be selected as one's mastery standard.
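As one concrete instance of the least-squares option mentioned above, the sketch below fits a quadratic regression of a criterion score on the performance level maintained and then picks the candidate level with the highest predicted score. The quadratic form is an assumption made here because a curvilinear relationship can have an interior maximum, whereas a straight-line fit would always favor an endpoint. The function names and data points are hypothetical, not results from this study.

```python
from typing import List, Sequence, Tuple

Coefficients = Tuple[float, float, float, float]  # (center, b0, b1, b2)


def fit_quadratic(levels: Sequence[float], scores: Sequence[float]) -> Coefficients:
    """Ordinary least-squares fit of score = b0 + b1*u + b2*u**2, with u = level - mean(level)."""
    center = sum(levels) / len(levels)  # centering improves numerical conditioning
    u = [x - center for x in levels]
    # Normal equations (X'X) b = X'y for the design columns 1, u, u**2.
    power_sums = [sum(ui ** k for ui in u) for k in range(5)]
    rhs = [sum(yi * ui ** k for ui, yi in zip(u, scores)) for k in range(3)]
    a: List[List[float]] = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    b0, b1, b2 = (a[i][3] / a[i][i] for i in range(3))
    return center, b0, b1, b2


def predict(coef: Coefficients, level: float) -> float:
    """Estimated criterion score for a given performance level."""
    center, b0, b1, b2 = coef
    u = level - center
    return b0 + b1 * u + b2 * u * u


def best_level(coef: Coefficients, candidates: Sequence[int]) -> int:
    """Candidate performance level with the greatest estimated future learning."""
    return max(candidates, key=lambda level: predict(coef, level))


# Hypothetical group means on one criterion at four performance levels.
coef = fit_quadratic([65, 75, 85, 95], [10.0, 40.0, 50.0, 40.0])
print(best_level(coef, range(60, 101)))  # prints 85
```

A Bayesian or other estimator could replace `fit_quadratic` without changing the final selection step.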



Advantages of the Proposed Approach 



While this approach to the setting of mastery standards is neither as simple nor as expedient as its predecessors, it does have some powerful advantages. First, it introduces a heretofore missing element of objectivity into the process of setting mastery standards. From the choice of learning criteria to the selection of some statistical estimation technique, one is forced to be explicit about the decision processes by which he arrives at his standards. Thereby he opens his standard-setting process to scrutiny and challenge by other individuals and enables these individuals, if they so choose, to independently verify his standard through replication.

Second, the procedure yields standards which have clear meaning for student learning. It enables one to set a standard whose attainment should lead to greater future learning than would the attainment of any other standard.



And third, it allows one to optimally design his mastery learning strategy. As Glaser and Nitko (1971) have pointed out, "Instruction proceeds as a function of the relationship among measures of student performance, available instructional alternatives and learning criteria that are chosen to be optimized." The approach developed here establishes a clear relationship among these variables. It forces one to select particular learning criteria to be optimized and to set standards whose attainment will clearly optimize this learning. One can then select from the available instructional alternatives the particular design which ensures that the standard selected will be maintained and hence that the desired end-of-instruction learning outcomes will be reached.



A Feasibility Study

The mastery standard-setting approach outlined above rests upon many assumptions. But perhaps the most basic of these assumptions is that the performance standard which a student is helped to attain over each segment of his instruction has important implications for his realization of the desired end-of-instruction learning outcomes. This segment of the paper reports the results of a study designed to test this assumption and, hence, the approach's feasibility.









Method 

Subjects: Ninety-one eighth graders from a lower-middle class suburb near Portland, Oregon, formed the sample.

Learning Materials: Matrix algebra was selected as the subject matter for this study because it best fit the following requirements.

First, it is taught sequentially; that is, each segment of instruction builds upon the prior segment. If the performance level which a student is helped to attain over each segment of his instruction does influence his future learning, then it seemed reasonable that these effects would be clearest in sequentially taught subjects. Second, the algebra was sufficiently relevant to the students' prior learning that it would be perceived neither as so difficult that only a few could learn it nor as so easy that it would be sheer busy work. Finally, the algebra was sufficiently esoteric to ensure no spillover of any negative experimental effects into the students' other school work.

Three programmed units in elementary matrix algebra were developed from a textbook by Fnushan (1968): Unit I, the definition and some properties of matrices; Unit II, special types of matrices and the rules of matrix equality; and Unit III, the process and rules of matrix addition and the process of matrix subtraction. Each unit was constructed so that most students would learn only about 50 percent of the material from the text alone.

Learning Criteria: Common school learning criteria were used to represent the future learning to be maximized, i.e., the goals of the instruction. The first criterion was Achievement. This is the criterion most often used by schools to measure a student's learning. It indicates his acquisition of the intellectual skills (content and mental processes) taught and also serves, much like aptitude and intelligence measures, as an index of general learning capacity.

Achievement, however, may be thought of as indexing only the level to which a student has learned. Bloom (1968), drawing on the work of Carroll (1963), has suggested that the rate at which a pupil learns to a given level and the level to which he learns in a given amount of time are interchangeable learning criteria. Hence, although some have challenged the utility of rate measures (e.g., see Cronbach and Snow, 1969), the Time Needed to Learn was chosen as a second criterion.



A third criterion was Transfer, where transfer was defined as the application (Bloom, 1956) of cognitive skills achieved under one set of conditions to the solution of related new problems. This criterion was selected since neither a high level of achievement nor a quick learning rate guaranteed that the student would be able to apply the skills acquired at one point in time at a future point. Much of school learning, though, is cumulative precisely in the sense that what is learned at one point must be applied at some later point to facilitate new learning (Gagné, 1965) or to solve new problems (Brownell, 1948).






If many of the skills acquired in school are to transfer to new learning or to the solution of new problems, then clearly these skills must be available when needed. But even if a student's learning is adequate at its completion, it need not be adequate, or even available, when required (e.g., see Brownell, 1948). Hence, Retention was selected as a fourth criterion.

All the preceding criteria may be classified as "cognitive" learning criteria. Many teachers and educational researchers (e.g., Brown, 1971; Krathwohl, Bloom and Masia, 1964; Messick, 1970) would assert, however, that learning is a phenomenon requiring both cognitive and affective criteria to capture its complexity. Accordingly, the following affective criteria were also chosen: Interest in and Attitude toward the algebra, both at and two weeks after the completion of its learning. These criteria were selected for the following reasons. First, unlike many affective traits (e.g., values), interests and attitudes might be developed in the brief period over which the experiment was to take place. Second, unlike most affective traits, interests and attitudes can be measured in at least some crude ways (Shaw and Wright, 1967).

Following Getzels (1969), an interest was conceived as a characteristic disposition of the individual, organized through experience, which induces the individual to actively seek out particular activities, skills and understandings associated with the object of affect. In the case of this study, interest was defined in terms of the individual's willingness to learn more about the experimental subject and to participate in a number of subject-related activities. An attitude, on the other hand, was conceived as an emotional tendency, organized through experience, to act in a characteristic positive or negative way toward the object of affect. In Getzels' scheme the formation of an interest is assumed to be prerequisite for the formation of an attitude toward a topic or subject.

Instruments: Three parallel forms of twenty-item formative criterion-referenced evaluation instruments were prepared for each unit following procedures outlined by Airasian (1969), Bloom, Hastings and Madaus (1971), and Bormuth (1970). These instruments would be used to determine each student's unit performance level.

Instruments were also developed for each of the learning criteria. Two forms of a twenty-item summative evaluation instrument (Bloom, Hastings, and Madaus, 1971) were developed to test achievement and retention respectively. A ten-item transfer test was devised to test the student's ability to apply some of the major laws of matrix algebra, e.g., the commutative law, $A + B = B + A$, and the associative law, $A + (B + C) = (A + B) + C$. Finally, Likert-type scales were developed to measure student interest in and attitude toward the algebra. Interest was measured by a scale designed to elicit the student's desire to learn more about various facets of the algebra and to participate in certain activities involving matrix algebra. The attitude scale was adapted from the "Attitude toward Mathematics" subscale of the International Study of Educational Achievement in Mathematics (Husén, 1967).¹
Experimental Procedures: The experiment was performed over one school week under actual school conditions: four sessions of 80 minutes for all students and one session of 40 minutes for students who needed more learning and testing time. At the first session, pretests of achievement, transfer, interest and attitude were administered, followed by Unit I. Units II and III were given in the second and third sessions respectively. Post-tests of achievement, transfer, interest and attitude were administered beginning with session four. Two weeks after session four, the retention measure was given and the interest and attitude instruments were re-administered.

Within each of four classes, students were assigned to one of five treatments. Sixteen students were assigned to each of four experimental treatments, where each treatment helped Ss to learn a different proportion (65, 75, 85, or 95 percent) of the material in each unit before proceeding to the next. The remaining students were assigned to a control treatment wherein Ss were not required to maintain any particular per-unit performance level.

The control and experimental treatments for each unit can be schematized as follows:

    Control and experimental:  Unit -> Unit formative test
    Experimental only:         -> Self-directed review (required)
                               -> Parallel review test items
                               -> Tutoring (as necessary)
                               -> Parallel review test items



In the control treatment, Ss completed the unit programmed textbook, completed the unit test and then, regardless of their score on the unit test, worked on specially assigned homework. In the experimental treatments, Ss completed the unit text and test and then, depending on their score on the test, either moved to the homework or reviewed portions of the unit. If the S had attained his required performance level, as indicated by his score on the unit test, he worked on the homework. If not, he reviewed just enough of the unlearned material to bring his performance to standard. Special programmed review materials and individual review prescriptions keyed to these materials were provided. The student could review as much or as little as he felt necessary. Upon completion of his review, he was then retested over the reviewed material with new items drawn from the second parallel form of the unit test. If he answered all these items correctly, he was allowed to work on the special homework. If not, he was tutored over the material still unlearned and then retested over this material with items drawn from a third parallel form of the unit test. Pilot testing had shown that this review/correction process would guarantee that virtually all experimental Ss could be helped to reach their required performance level.

¹ The reliability indices for each learning criterion instrument were:

    Achievement   .84   (Kuder-Richardson Formula 21)
    Retention     .81   (Kuder-Richardson Formula 21)
    Transfer      .89   (Kuder-Richardson Formula 21)
    Interest      .92   (Odd-Even, Split-Half)
    Attitude      .89   (Odd-Even, Split-Half)

These coefficients are based on a sample size of n = 25.
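The per-unit review/correction sequence described above amounts to a small decision procedure. The sketch below traces the steps a single student would pass through in one unit. It is illustrative only. The function and parameter names are hypothetical, and the outcome of the second-form retest is passed in rather than simulated. The paper does not say what happened if a student missed items on the third-form retest, so in that case the sketch simply ends with homework; pilot testing suggested such cases were rare.

```python
from typing import List


def required_correct(standard_pct: int, n_items: int = 20) -> int:
    """Items a student must answer correctly on an n_items formative test to meet the standard."""
    return -(-standard_pct * n_items // 100)  # integer ceiling avoids float rounding


def unit_steps(score: int, standard_pct: int, experimental: bool,
               passed_second_form: bool = False, n_items: int = 20) -> List[str]:
    """List the steps one student follows in one unit under the control or experimental treatment."""
    steps = ["study programmed unit text", "take unit formative test (form 1)"]
    if not experimental or score >= required_correct(standard_pct, n_items):
        # Control Ss, and experimental Ss already at standard, go straight to homework.
        return steps + ["homework"]
    shortfall = required_correct(standard_pct, n_items) - score
    steps += [f"self-directed review of missed material ({shortfall} item(s) short of standard)",
              "retest reviewed material with items from parallel form 2"]
    if not passed_second_form:
        # Any item missed on the form 2 retest triggers tutoring and a form 3 retest.
        steps += ["tutoring on material still unlearned",
                  "retest remaining material with items from parallel form 3"]
    return steps + ["homework"]


# A hypothetical 95 percent student who scored 15/20 and then missed an item on form 2.
for step in unit_steps(15, 95, experimental=True):
    print(step)
```

On a twenty-item test, the four standards correspond to 13, 15, 17 and 19 items correct.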

Data Gathered: In addition to the pretest, post-test and retention data, the following information was gathered. First, each student's unit performance level before any review/correction. Second, each student's unit performance level after review/correction, if any. Third, the time spent per unit by each student in learning via the textbook and any self-directed review and tutoring. And fourth, each student's interest in and attitude toward the algebra at the completion of each unit.

Data Analysis: Across the four classes, a total of 16 students were assigned to each of the four experimental treatments and 27 students to the control group. However, five experimental and two control students who began the experiment failed to complete it. Eight other experimental students completed the experiment but were dropped for purposes of data analysis because they consistently exceeded their required performance level (i.e., learned at least 10 percent more material than required) or consistently failed to attain it (i.e., learned at least 10 percent less material than required). Consequently, data were analyzed for only 25 subjects in the control group, 12 in the 65 percent experimental group, 14 in the 75 and 85 percent groups each, and 11 in the 95 percent group.
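The exclusion rule can be stated precisely in code, with two interpretive assumptions: "consistently" is read as "on every unit", and "10 percent" is read as 10 percentage points of the unit's material. The paper defines neither, so these readings, the function name and the example scores are hypothetical.

```python
from typing import Sequence


def retained_for_analysis(unit_levels: Sequence[float], required_pct: float,
                          margin: float = 10.0) -> bool:
    """False if a student was at least `margin` points above, or at least `margin`
    points below, the required level on every unit (assumed meaning of 'consistently')."""
    always_over = all(level - required_pct >= margin for level in unit_levels)
    always_under = all(required_pct - level >= margin for level in unit_levels)
    return not (always_over or always_under)


# Hypothetical 65 percent student who learned 80, 85 and 90 percent of Units I-III.
print(retained_for_analysis([80, 85, 90], required_pct=65))  # prints False
```

A student who overshoots on only some units is retained under this reading, which is the least restrictive interpretation of the rule.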

The data were analyzed as follows. First, the mean scores yielded by each treatment were plotted to investigate the general nature of the relationship between the performance level maintained and student learning as indexed by each criterion. Second, the scores on each criterion measure were analyzed using one-way univariate analysis of variance procedures (Bock, 1963). The least-squares estimated effects generated in these analyses and the estimates' standard errors were then used to compare and contrast the effects of the various treatments on each criterion.
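This analysis can be approximated by a one-way analysis of variance followed by treatment-versus-control t statistics that use the pooled within-group mean square as the error term. The sketch below is an assumption about how the estimated effects and their standard errors might be compared. It does not reproduce Bock's (1963) procedures, and the group labels and scores are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, float, int]:
    """Return (F statistic, within-group mean square, within-group degrees of freedom)."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)
    group_means = {name: mean(g) for name, g in groups.items()}
    ss_between = sum(len(g) * (group_means[n] - grand_mean) ** 2 for n, g in groups.items())
    ss_within = sum((x - group_means[n]) ** 2 for n, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within, df_within


def treatment_vs_control_t(groups: Dict[str, List[float]], treatment: str,
                           control: str, ms_within: float) -> float:
    """Difference in means divided by its standard error based on the pooled mean square."""
    a, b = groups[treatment], groups[control]
    se = math.sqrt(ms_within * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


# Hypothetical scores for two of the five groups.
groups = {"control": [1.0, 2.0, 3.0], "95 percent": [4.0, 5.0, 6.0]}
f, ms_within, df = one_way_anova(groups)
print(f, df)  # prints 13.5 4
print(round(treatment_vs_control_t(groups, "95 percent", "control", ms_within), 3))  # prints 3.674
```

With only two groups, the squared t equals F, which is a quick sanity check on the arithmetic. With all five groups, each t compares one experimental treatment with the control using the error term pooled over all groups.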



Results 

Achievement and Retention 

As indicated in Figure 1, there was a linear relationship between the per-unit performance level maintained over the sequence and mean scores on the achievement and retention measures. Only maintenance of the 85 and 95 percent levels, however, yielded scores which were significantly higher (achievement: $t_{85} = 2.93$, $t_{95} = 1.93$; retention: $t_{85} = 3.01$, $t_{95} = 1.90$)¹ than the control group's scores.

Insert Figure 1

Besides suggesting that there was some relationship between the maintenance of particular performance levels and the mean level of student achievement, the data also indicated an interesting relationship between the maintenance of the various levels and the variability in student achievement. Table 1 reports the mean achievement test scores and the variance of these scores for each treatment group.² Note that as the per-unit performance level Ss were asked to maintain increased, mean achievement test scores rose and the variance of these scores fell. The 85 and 95 percent treatments not only helped students achieve to significantly higher levels than the control treatment, but they also helped homogenize student performance around these high levels.

Insert Table 1

¹ p < .05. All hypotheses in the study were tested at the .05 level.

² Since the mean achievement scores do not approach the test's ceiling, these variances are not artificially restricted.

Transfer

The mean scores of each treatment group on the transfer test are plotted in Figure 2. Here there was no linear relationship between the performance level maintained and mean scores, due primarily to the very low mean score of the 75 percent group. Further, only maintenance of the 95 percent standard yielded significantly higher scores (transfer: $t_{95} = 3.02$, p < .05) than the control treatment.

Insert Figure 2

No completely satisfactory explanation can be given for the low mean transfer score of the 75 percent group. The score might be an artifact of the small sample size and the small number of items (10) in the transfer measure. It might also be attributable to the relatively negative interest in and attitude toward the algebra which, as will be shown shortly, the 75 percent treatment group exhibited after the sequence's completion. The latter explanation is less tenable than the former, however, because the achievement and transfer measures were given together, yet the mean score of the 75 percent group on the achievement measure was not so adversely affected.

Learning Time and Efficiency 

Figure 3 illustrates the average total time spent in learning by the various treatment groups. As is clear from this figure, there was a curvilinear relationship between the performance level maintained and the average total learning time spent. The ANOVA results indicate that all experimental groups spent significantly more (p < .05) learning time than the control group.

Insert Figure 3



But one aspect of these time data warrants further analysis. Note that the 75, 85, and 95 percent treatment groups all spent approximately the same total amount of learning time despite the fact that the 95 percent group had to learn more material than the 85 percent group, and the latter more material than the 75 percent group. This situation could have occurred only if the 95 percent group learned more efficiently, i.e., learned more material in a given time, than the 85 percent group, and the 85 percent group learned more efficiently than the 75 percent group.
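The efficiency argument is simple arithmetic: if total learning time is about equal, the group that must reach a higher level learns more material per unit of time. The sketch below makes that explicit. It assumes efficiency means percent of material learned per minute; the minute figure is a hypothetical placeholder, not a time reported in the study.

```python
def efficiency(percent_learned: float, minutes: float) -> float:
    """Percent of unit material learned per minute of learning time (assumed definition)."""
    return percent_learned / minutes


# Hypothetical: three groups spend the same total time but must reach different levels.
total_minutes = 200.0
for level in (75, 85, 95):
    print(level, round(efficiency(level, total_minutes), 3))
```

Holding time fixed, efficiency rises in step with the level required, which is the ordering the efficiency hypothesis predicts.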

To explore this "efficiency" hypothesis, the total learning time was broken into the time spent in textbook learning and the time spent in correction/review for each unit. The analysis focused on the relationship between the average amount of material learned using only the unit text and the time spent in that learning. Table 2 partially summarizes this relationship.



Insert Table 2 

Note that by Unit III, students in the 95 percent group were spending much less textbook learning time than the other experimental groups and roughly the same time as the control group. Further, by Unit III they were also learning more material, as evidenced by their average formative test score, than any other experimental group and roughly 40 percent more material than the control group. Taken together, therefore, these findings suggest that maintenance of the 95 percent level eventually helped make these students' learning more efficient than the learning of both the control and the other experimental groups.




Interest and Attitude

Figures 4 and 5 are plots of the mean scores for each treatment group on the interest and attitude measures. Figure 4 presents scores on the measures administered with the achievement and transfer instruments. Figure 5 presents scores on the measures administered with the retention instrument. Hereafter, interest in and attitude toward the algebra measured just after its completion will be called "short-term" interest and attitude respectively. Similarly, interest in and attitude toward the algebra two weeks after its completion will be called "long-term" interest and attitude respectively.

Insert Figures 4 and 5

If the scores of the 75 percent group are disregarded, then in both 
Figures 4 and 5 there was a curvilinear relationship between the per unit 
performance level maintained and mean scores on each criterion. In all 
cases except short-term interest, the scores increased as a function of 
the level maintained up to the 85 percent level and then dropped off at 
the 95 percent level. This pattern is especially apparent in the case of 
long-term attitude. 

The analyses of variance yielded the following results. On the short-term interest, short-term attitude and long-term interest criteria, both the 85 and 95 percent treatments yielded significantly greater (p < .05) scores (short-term interest: $t_{85} = 1.98$, $t_{95} = 1.84$; short-term attitude: $t_{85} = 2.47$, $t_{95} = 2.83$; long-term interest: $t_{85} = 1.97$, $t_{95} = 2.34$) than the control treatment. But the difference in the effects of the 85 and 95 percent treatments was statistically insignificant. On the long-term attitude criterion, however, only the 85 percent treatment yielded significantly greater (p < .05) scores (long-term attitude: $t_{85}$ = 3.d2) than the control treatment.



Discussion 

While all these findings must be interpreted cautiously until replicated with a larger sample on a longer learning sequence,¹ on the whole they do suggest that the standard-setting approach proposed here is feasible. The maintenance of particular performance levels throughout the instruction did influence the students' future learning as characterized by the selected learning criteria. Further, the maintenance of different levels had different effects on the learning. In particular, maintenance of the 95 percent level best maximized the learning represented by the cognitive criteria, while maintenance of the 85 percent level best maximized the learning represented by the affective criteria. Given a model for relating scores on the cognitive criteria to scores on the affective criteria, therefore, it would have been possible to set a mastery standard for the algebra sequence.



¹ A replication study is currently under way.

Summary

When the task of evaluating student learning is carefully considered, two major problems emerge. One is the gathering of the most appropriate and precise evidence possible about the learning. The other is the setting of performance standards against which this evidence may be weighed and the adequacy of each student's learning judged. This paper has focused on but one facet of the latter problem, viz., the problem of setting performance standards for use in strategies for mastery learning.

The paper began with the argument that a hey variable in. the design of 
these strategies are the mastery performance standards which students are 
helped to attain throughout their instruction. Unless standards can be set 
whose maintenance does produce the desired end- of- instruction learning out- 
comes, then these strategies will ha luiadesigaed. 

It was then pointed out that presently there are no procedures for setting
such standards. Two reasons were given for this situation. First, mastery
performance standards must be absolute rather than relative standards, for
use in interpreting scores on criterion-referenced rather than
norm-referenced testing instruments. And second, mastery standards must be
set using rational, as opposed to arbitrary or irrational, techniques. These
techniques should yield standards whose superiority as indices of the
adequacy of a student's learning can be defended both logically and
empirically.
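To make the contrast between absolute and relative standards concrete, here is a minimal Python sketch. The function names, the 85 percent cutoff, the "top half" rule, and the scores are all hypothetical, chosen only for illustration; the paper does not prescribe this code. An absolute (criterion-referenced) standard compares each score with a fixed percent-correct cutoff, so every student can meet it; a relative (norm-referenced) standard judges a score by its rank in the group, so some students fall below it no matter how much everyone has learned.

```python
from typing import List


def meets_absolute_standard(percent_correct: float, cutoff: float = 85.0) -> bool:
    """Criterion-referenced: compare the score with a fixed percent-correct cutoff.

    The 85.0 default is a hypothetical cutoff used only for illustration.
    """
    return percent_correct >= cutoff


def meets_relative_standard(percent_correct: float, group: List[float],
                            top_share: float = 0.5) -> bool:
    """Norm-referenced: the score 'passes' only if it ranks in the top share of the group.

    Scores tied with the threshold score all pass.
    """
    ranked = sorted(group, reverse=True)
    n_pass = max(1, round(len(ranked) * top_share))
    threshold = ranked[n_pass - 1]
    return percent_correct >= threshold


# Hypothetical class in which every student scored at or above 85 percent.
scores = [86.0, 90.0, 92.0, 95.0]
print([meets_absolute_standard(s) for s in scores])          # [True, True, True, True]
print([meets_relative_standard(s, scores) for s in scores])  # [False, False, True, True]
```

Under the absolute standard all four hypothetical students are judged to have reached mastery; under the relative standard half of them are not, although the scores being judged are identical. A relative standard therefore cannot say whether a given level of learning is adequate, which is why mastery standards must be absolute.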

Having established the need for rational procedures for setting mastery
standards, next an attempt was made to formulate one such procedure. The
approach developed utilizes students' future learning, i.e., their scores
on a set of desired, end-of-instruction learning outcomes, as a criterion
for determining the mastery performance level which students must attain at
any stage in their instruction. To apply the approach, the following steps
must be taken. First, the future learning of interest must be specified
in terms of a set of defensible learning criteria. Next, a model for
interrelating, weighting and combining scores over the various criteria must
be developed. This model should capture the wholeness and complexity of the
future learning of interest. And finally, some statistical technique must
be selected for estimating future learning, i.e., scores on the model, as
a function of the various performance levels to which the segment or segments
of instruction over which the standard will be set might be learned. That
performance level whose attainment will yield the greatest estimated future
learning is then selected as one's mastery performance standard.
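The three steps can be sketched in Python under strong simplifying assumptions. Everything here is hypothetical: the criterion names, the weights, the toy scores, and especially the choices of model (a weighted sum of z-scores) and of estimation technique (the mean composite among the students who attained each performance level). The paper leaves both choices open; a regression of composite scores on attained performance level would be an equally legitimate stand-in.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Convert raw scores on one criterion to z-scores so criteria on different scales can be combined."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def composite_scores(criteria: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Step 2 (hypothetical model): weighted sum of each student's standardized criterion scores."""
    z = {name: standardize(vals) for name, vals in criteria.items()}
    n_students = len(next(iter(criteria.values())))
    return [sum(weights[name] * z[name][i] for name in criteria)
            for i in range(n_students)]


def choose_mastery_standard(levels: Sequence[int],
                            criteria: Dict[str, Sequence[float]],
                            weights: Dict[str, float]) -> int:
    """Step 3: estimate future learning at each performance level (here simply the mean
    composite of the students who attained that level) and return the level with the
    highest estimate."""
    composite = composite_scores(criteria, weights)
    by_level: Dict[int, List[float]] = {}
    for level, score in zip(levels, composite):
        by_level.setdefault(level, []).append(score)
    estimates = {level: mean(group) for level, group in by_level.items()}
    return max(estimates, key=estimates.get)


# Step 1 (invented toy data): the per-unit level each student attained and two
# end-of-instruction criteria, one cognitive and one affective.
levels = [75, 75, 85, 85, 95, 95]
criteria = {
    "achievement": [50, 54, 60, 62, 66, 68],  # percent correct
    "interest": [15, 16, 20, 21, 17, 18],     # interest-scale score
}

print(choose_mastery_standard(levels, criteria, {"achievement": 0.5, "interest": 0.5}))  # 85
print(choose_mastery_standard(levels, criteria, {"achievement": 1.0, "interest": 0.0}))  # 95
```

With these invented numbers, weighting the two criteria equally selects the 85 percent level, while weighting achievement alone selects the 95 percent level. The standard that is chosen depends directly on how the model weights the criteria, which is why the approach requires that model to be stated explicitly.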

While this approach is neither as simple nor as quick as its predecessors,
it does have some powerful advantages. First, it introduces a heretofore
missing element of objectivity into the mastery standard-setting process.
It forces one to be explicit about the decision processes by which he
arrives at his standards and, thereby, opens the standard-setting process
to public scrutiny, challenge, and replication. Second, the approach
yields standards whose attainment has clear meaning in terms of the students'
future learning. And third, it enables one to design his instruction
optimally by establishing a clear relationship between the three major
variables which condition how the instruction should proceed: measures of
student performance, the learning criteria that are chosen to be optimized,
and the available instructional alternatives.

Finally, the paper reported an experiment designed to explore the
feasibility of the approach proposed. Perhaps the most basic assumption which
underlies this approach is that the performance standard which a student
attains over each segment of his instruction has important implications for
his realization of the desired, end-of-instruction learning outcomes. The
experiment reported was designed to test this assumption.
Ninety-one eighth graders were taught a three-unit sequence in elementary
matrix algebra over one school week. The students had been randomly
assigned to five treatment groups. The control group learned the algebra
under no requirement that they maintain any particular per-unit performance
level, while the experimental groups learned under the requirement that they
each maintain a different per-unit level. The effects of the control and
the experimental treatments on selected end-of-instruction cognitive and
affective learning criteria were then examined. The cognitive criteria
were achievement, retention, transfer and learning rate; the affective
criteria were interest in and attitude toward the algebra at and two weeks
after the instruction's termination.

In general, the experiment's results confirmed the assumption tested.
The maintenance of particular performance levels throughout the instructional
sequence did have significant effects on student learning as characterized
by the various learning criteria. Further, the maintenance of particular
levels had different effects on different classes of criteria. In
particular, the maintenance of one level best maximized scores on the
cognitive criteria while the maintenance of another best maximized scores
on the affective criteria.









TABLE 1

THE AVERAGE ACHIEVEMENT TEST SCORES AND THE ACHIEVEMENT TEST
SCORE VARIANCES FOR THE CONTROL AND EXPERIMENTAL GROUPS

Group                    Average achievement     Variance of achievement
                         test score              test scores (s²)
                         (percent correct)

Experimental
  95 percent (n = 11)          64.9                     82.8
  85 percent (n = 14)          60.7                    110.2
  75 percent (n = 14)          50.8                    139.2
  65 percent (n = 12)          49.0                    240.2
Control (n = 25)               50.5                    501.8



TABLE 2

THE AVERAGE AMOUNT OF TIME SPENT IN TEXTBOOK LEARNING PER ALGEBRA UNIT
AND THE AVERAGE FORMATIVE TEST SCORES ON UNIT III BEFORE
FEEDBACK/CORRECTION FOR THE CONTROL AND EXPERIMENTAL GROUPS

                         Average textbook learning time     Average formative test
                         (in minutes)                       score, Unit III
Group                    Unit I   Unit II   Unit III        (mean percent correct)

Experimental
  95 percent (n = 11)     11.4     14.2      25.8                 74.4
  85 percent (n = 14)     11.4     15.0      29.4                 63.4
  75 percent (n = 14)     11.2     15.2      29.1                 56.5
  65 percent (n = 12)     11.1     14.3      27.1                 63.7
Control (n = 25)          11.1     12.7      25.3                 54.2



[Figure: Average transfer scores for the control and experimental groups. Vertical axis: average percent correct (scale 30 to 70). Groups: control (n = 25), 65 percent, 75 percent (n = 14), 85 percent (n = 14) and 95 percent (n = 11).]
[Figure: Average total learning time for the control and experimental groups. Vertical axis: average total time in minutes. Groups: control (n = 25), 65 percent (n = 12), 75 percent (n = 14), 85 percent (n = 14) and 95 percent (n = 11).]








[Figure: Average interest and attitude scores for the control and experimental groups. Two curves, labeled Attitude and Interest; vertical scale 14 to 24. Groups: control (n = 25), 65 percent, 75 percent (n = 14), 85 percent and 95 percent (n = 11).]



